Abstract

The purpose of this paper is to examine the common misinterpretation of correlation as causation. Articles that have addressed this issue are reviewed, and possible reasons for the misinterpretation are presented. The differences between correlational and experimental research designs are reviewed, and the implications of their findings are discussed. The bias that exists against correlational designs is also examined in light of the respective abilities of the two designs to support inferences of association or causality. The dangers of confusing correlation with causality are discussed, and an example of a linear regression analysis is used to illustrate how correlation can be mistaken for causality.
Correlation versus Causation: Another look at a common misinterpretation

One of the fundamental concepts taught in almost any statistics course is that correlation does not imply causality; however, the human mind seems to be programmed to see causal relationships where there are none (Bracy, 1998). Inferring causation from correlation is a long-standing misinterpretation, and one that continues to surface in the literature. Over the years, a number of authors have articulated the differences between the two concepts, while others have proposed possible explanations for why they are confused. The purpose of this paper is to highlight the misinterpretation of correlation as causation by: 1) discussing the implications of the correlation statistic, 2) discussing the relationship between correlation and research design, including the respective abilities of correlational and experimental designs to support inferences of association or causality, 3) highlighting the dangers of confusing correlation with causality, and 4) using an example of a linear regression analysis to illustrate how correlation can be mistaken for causality.

The Correlation Statistic 

Silvestri (1989) suggested that the term “correlation” refers either to the correlation statistic or to the non-experimental research design. The correlation statistic is a measure of the strength and direction of a relationship between variables. It does not imply a cause-and-effect relationship between the variables. Bateson (1995) offered an interesting example, noting that it is not justifiable to infer that churches cause crime simply because there is a high positive correlation between the number of churches and the crime rate in urban centers in the United States. From the correlation statistic it can only be established that a relationship of a specified degree and direction exists between the two variables. One cannot make any inferences concerning causality based on the correlation statistic alone (Onwuegbuzie & Daniel, 1999).
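A common explanation offered for examples like this one is a third variable that drives both quantities. The sketch below illustrates that idea with hypothetical simulated data (not Bateson's): city population, an assumed common cause, raises both the number of churches and the number of crimes, while neither count depends on the other. The Pearson correlation between churches and crime is nevertheless very high, and it largely disappears once population is partialled out. The rates, noise levels, and seed are arbitrary assumptions.

```python
import math
import random


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_r(r_xy, r_xz, r_yz):
    """Correlation of x and y after removing the linear influence of z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))


def simulate_cities(n_cities=200, seed=1):
    """Hypothetical cities in which population drives both counts.

    Churches and crimes are generated independently given population,
    so neither has any causal effect on the other.
    """
    rng = random.Random(seed)
    population, churches, crimes = [], [], []
    for _ in range(n_cities):
        pop = rng.uniform(10_000, 1_000_000)              # assumed common cause
        population.append(pop)
        churches.append(pop / 2_000 + rng.gauss(0, 20))   # about 1 church per 2,000 people
        crimes.append(pop / 100 + rng.gauss(0, 400))      # about 1 crime per 100 people
    return population, churches, crimes


population, churches, crimes = simulate_cities()
r = pearson_r(churches, crimes)
partial = partial_r(r, pearson_r(churches, population), pearson_r(crimes, population))
print(f"r(churches, crime)              = {r:.2f}")
print(f"r(churches, crime | population) = {partial:.2f}")
```

Any sufficiently strong common cause produces the same pattern; the correlation coefficient itself is silent about which causal structure generated the data.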

Correlation and Research Design 

According to Silvestri (1989), it is the research design that determines whether a causal relationship or merely an association can be inferred. The experimental design is a research protocol set up to establish causal inferences between variables. In such designs, variables are strictly controlled and manipulated in order to infer a causal relationship. In addition, alternative causal explanations must be eliminated before a definite cause is settled upon. These conditions can never be fully satisfied in theory, but they can be approximated to a substantial degree in practice (Harcum, 1988).
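The role of manipulation and random assignment can be sketched with a small simulation. Everything here is hypothetical: an assumed treatment with a true effect of 2 points, and an assumed confounder (motivation) that raises the outcome and, in the observational version, also makes people more likely to choose the treatment. Comparing group means in the observational data mixes the treatment effect with the confounder; when the researcher assigns treatment at random, the confounder is balanced across groups and the same comparison approximately recovers the true effect.

```python
import random

TRUE_EFFECT = 2.0  # hypothetical causal effect of the treatment on the outcome


def outcome(treated, motivation, rng):
    # Outcome depends on motivation (a confounder) and, causally, on treatment.
    return 50 + 10 * motivation + TRUE_EFFECT * treated + rng.gauss(0, 5)


def mean_difference(rows):
    """Mean outcome of treated rows minus mean outcome of untreated rows."""
    treated = [y for d, y in rows if d == 1]
    control = [y for d, y in rows if d == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def observational(n, rng):
    rows = []
    for _ in range(n):
        m = rng.gauss(0, 1)
        d = 1 if m + rng.gauss(0, 1) > 0 else 0   # motivated people self-select
        rows.append((d, outcome(d, m, rng)))
    return rows


def experiment(n, rng):
    rows = []
    for _ in range(n):
        m = rng.gauss(0, 1)
        d = rng.randint(0, 1)                     # researcher assigns at random
        rows.append((d, outcome(d, m, rng)))
    return rows


rng = random.Random(42)
obs_estimate = mean_difference(observational(20_000, rng))
exp_estimate = mean_difference(experiment(20_000, rng))
print(f"observational: {obs_estimate:.1f}, randomized: {exp_estimate:.1f}, truth: {TRUE_EFFECT}")
```

Random assignment does not remove the confounder; it only breaks the link between the confounder and who receives the treatment, which is what allows the difference in means to be read causally.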

Correlational, or non-experimental, designs (commonly referred to as observational studies) do not allow for causal inference. They merely establish the relationships that exist between variables and describe those relationships in terms of their strength and direction. These relationships are usually determined through a series of observations carried out by the researcher. The observations are based on the “real world” interaction of the variables rather than on a predefined environment in which the variables can be manipulated (Miller, Chaplin, & Coombs, 1990). Such studies, which typically employ correlational statistics, make up the bulk of the studies currently in the literature. However, Silvestri (1990) cautioned that there is no necessary link between the non-experimental design and the use of the correlation statistic.
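Silvestri's point can be illustrated with a hypothetical randomized experiment analyzed with a correlation statistic. Correlating a 0/1 treatment indicator with the outcome gives a point-biserial correlation, and converting it with t = r * sqrt(n - 2) / sqrt(1 - r²) reproduces the pooled two-sample t statistic exactly. The statistic is interchangeable; it is the random assignment, not the choice of statistic, that would license a causal reading. The group sizes, effect size, and seed below are arbitrary assumptions.

```python
import math
import random


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pooled_t(group1, group0):
    """Two-sample t statistic assuming equal variances."""
    n1, n0 = len(group1), len(group0)
    m1, m0 = sum(group1) / n1, sum(group0) / n0
    ss = sum((v - m1) ** 2 for v in group1) + sum((v - m0) ** 2 for v in group0)
    pooled_var = ss / (n1 + n0 - 2)
    return (m1 - m0) / math.sqrt(pooled_var * (1 / n1 + 1 / n0))


rng = random.Random(7)
treatment = [rng.randint(0, 1) for _ in range(100)]          # random assignment
scores = [60 + 3 * d + rng.gauss(0, 8) for d in treatment]   # hypothetical effect of 3 points

r = pearson_r(treatment, scores)
n = len(scores)
t_from_r = r * math.sqrt(n - 2) / math.sqrt(1 - r**2)
t_direct = pooled_t(
    [s for d, s in zip(treatment, scores) if d == 1],
    [s for d, s in zip(treatment, scores) if d == 0],
)
print(f"point-biserial r = {r:.3f}")
print(f"t from r = {t_from_r:.4f}, pooled t = {t_direct:.4f}")
```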

The misinterpretation of correlation as causation could possibly be attributed to the difference in status between experimental and correlational studies. Experimental studies tend to be credited with greater importance by virtue of their sophisticated design, verifiable results, and ability to support causal inference. Correlational studies tend to be viewed as second-rate because their findings amount to hypotheses that ultimately need to be verified by an experimental design (Miller, Chaplin, & Coombs, 1990). This bias against correlational design is unfounded and should not suggest a hierarchical ranking of the two types of research design. Each design type depends on the other to accomplish its objectives; neither is superior, and neither could exist independently. However, in certain research situations the nature of the variables may require a specific type of research design. This is especially true when it is impossible or unethical to manipulate the predictor variables under study. In such a situation, a correlational study may be the only option available to the researcher. Nevertheless, both design types remain necessary to explain the existence and interaction of variables within a specific environment (Miller, Chaplin, & Coombs, 1990).

Part of the discrimination between the research designs can be attributed to research journals, which tend to favor studies conducted with an experimental design (Miller, O’Bannon, & Melvin, 1980). Although most journals will publish any material that warrants recognition, a stigma is still attached to non-experimental studies. This is evident from the large number of journals specifically dedicated to experimental research as opposed to other research types. A number of authors have tried to address this imbalance, but to no avail. Editors and authors need to make a concerted effort not to elevate any one type of design as superior, but rather to treat all design types on an equal footing. Experimental and correlational designs are equally effective; they merely differ in their objectives and approach.

Dangers of Confusing Correlation with Causality 

Misinterpreting correlation as causation can have far-reaching consequences. Bracy (1998) cited a study in which the College Board found a high positive correlation between taking algebra in eighth or ninth grade and going to college. The Secretary of Education misinterpreted this finding and went on to state that mathematics courses, including algebra, were the gateway to college and future employment. In other words, a causal relationship was read into a high positive correlation. The consequences of such a misinterpretation can be harmful, because most lay readers would be led to accept the Secretary of Education’s statement at face value. It could result in an unexpectedly large number of students taking algebra in the belief that this subject choice will eventually get them into college, which might in turn result in the misplacement of many students in courses not suited to their needs, interests, or developmental level (Bracy, 1998).
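The gap between an observed association and the effect of a policy can be sketched with hypothetical data. The simulation below assumes, purely for illustration and not as a claim about real students, that prior achievement makes students both more likely to take early algebra and more likely to attend college, and that algebra itself has no effect on college attendance. Under those assumptions, algebra takers attend college at a much higher rate than non-takers, yet a mandate that every student take algebra leaves the overall college rate essentially unchanged.

```python
import random


def simulate_students(n, mandate_algebra, seed):
    """Return (took_algebra, went_to_college) pairs for n hypothetical students."""
    rng = random.Random(seed)
    students = []
    for _ in range(n):
        achievement = rng.gauss(0, 1)  # hypothetical common cause
        if mandate_algebra:
            took_algebra = True        # policy: every student takes algebra
        else:
            took_algebra = achievement + rng.gauss(0, 1) > 0.5  # stronger students opt in
        # Assumption for illustration only: college depends on achievement, not algebra.
        went_to_college = rng.random() < (0.75 if achievement > 0 else 0.25)
        students.append((took_algebra, went_to_college))
    return students


def college_rate(students):
    """Share of students who went to college."""
    return sum(college for _, college in students) / len(students)


observed = simulate_students(20_000, mandate_algebra=False, seed=3)
with_algebra = [s for s in observed if s[0]]
without_algebra = [s for s in observed if not s[0]]
gap = college_rate(with_algebra) - college_rate(without_algebra)

mandated = simulate_students(20_000, mandate_algebra=True, seed=4)
print(f"observed gap in college rates: {gap:.2f}")
print(f"overall college rate without mandate: {college_rate(observed):.2f}")
print(f"overall college rate with mandate:    {college_rate(mandated):.2f}")
```

If algebra did have a causal effect, the mandate would raise the overall rate; the correlation in the observational data cannot, by itself, distinguish these two situations.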

Data Example 

Writers and readers of research alike mistake correlation for causation fairly frequently, especially when a study includes linear regression analysis. In linear regression, the researcher is primarily concerned with prediction, which is accomplished in part by establishing correlations between variables. To demonstrate how easy it is to infer causality from correlation, a regression analysis was conducted on data gathered by Holzinger and Swineford (1939). In that study, the researchers collected data from over 20 tests of ability administered to a sample of middle school students. The purpose of their study was to establish how scores on different tests were related to each other in order to identify groupings of abilities that accounted for overall academic performance.

The present regression analysis of the Holzinger and Swineford data sought to determine whether students’ scores on the paragraph comprehension test (variable T6) were related to their scores on the general information verbal test (T5), the sentence completion test (T7), the word classification test (T8), and the word meaning test (T9). Hence, the goal of the analysis was to see whether the dependent variable (T6) could be accurately predicted from the predictor variables (T5, T7, T8, and T9). The analysis included the scores of 301 students, and the null hypothesis was that there would be no statistically significant (α = .05) multiple correlation between the set of predictor variables and the dependent variable ($H_0: R_{T5,T7,T8,T9 \cdot T6} = 0$).



Insert Table 1 here 



As shown in Table 1, the regression was statistically significant at the .001 level; the null hypothesis was therefore rejected, implying that the set of predictor variables and the dependent variable were related. The large effect size indicates the strength of this relationship: the calculated R² value was .612, meaning that 61% of the variance in the dependent variable was explained by the predictors.
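The summary statistics can be recomputed from the sums of squares in Table 1, which is a useful check when reading regression output. The sketch below uses only the values in Table 1 (and N = 301 from the text): R² is the regression sum of squares divided by the total, each mean square is a sum of squares divided by its degrees of freedom, and F is the ratio of the two mean squares. The adjusted R² and the equivalent form F = (R²/k) / ((1 - R²)/(N - k - 1)) are standard textbook formulas added here for illustration.

```python
# Values taken from Table 1 (N = 301 students, k = 4 predictors).
ss_regression = 2238.452
ss_residual = 1420.498
df_regression = 4      # k: T5, T7, T8, T9
df_residual = 296      # N - k - 1 = 301 - 4 - 1
n = df_regression + df_residual + 1

ss_total = ss_regression + ss_residual          # should match the Total row of Table 1
r_squared = ss_regression / ss_total            # proportion of T6 variance explained
multiple_r = r_squared ** 0.5

ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_statistic = ms_regression / ms_residual

# The same F expressed through R^2, as used to test H0: R = 0.
f_from_r2 = (r_squared / df_regression) / ((1 - r_squared) / df_residual)

adj_r_squared = 1 - (1 - r_squared) * (n - 1) / df_residual

print(f"R^2 = {r_squared:.3f}, R = {multiple_r:.3f}, adjusted R^2 = {adj_r_squared:.3f}")
print(f"MS regression = {ms_regression:.3f}, MS residual = {ms_residual:.3f}")
print(f"F({df_regression}, {df_residual}) = {f_statistic:.2f}")
```

This reproduces R² ≈ .612 and gives F(4, 296) ≈ 116.6. If SciPy is available, `scipy.stats.f.sf(f_statistic, 4, 296)` returns the corresponding p-value, which is far below .001.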

Having established a strong relationship between the predictor variables and the dependent variable, we next examined the regression structure coefficients. The results outlined in Table 2 indicate that the correlations between the predictor variables and the predicted values of the dependent variable (the structure coefficients) range from a strong positive correlation of .74 (T8) to a very strong positive correlation of .94 (T7).
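A structure coefficient is the correlation between a predictor and the predicted scores (Ŷ). For ordinary least squares with an intercept, it equals the predictor's ordinary correlation with the dependent variable divided by the multiple correlation R. The sketch below checks that identity on simulated, hypothetical data (not the Holzinger and Swineford scores) using NumPy; the noise levels, weights, and seed are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 301

# Hypothetical correlated predictors sharing a common component.
common = rng.normal(size=n)
X = np.column_stack([common + rng.normal(scale=s, size=n) for s in (0.6, 0.5, 0.9, 0.7)])
y = X @ np.array([0.4, 0.8, 0.1, 0.5]) + rng.normal(scale=1.5, size=n)

# Ordinary least squares with an intercept column.
design = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
y_hat = design @ coef

multiple_r = np.corrcoef(y, y_hat)[0, 1]
# Structure coefficients: correlation of each predictor with the predicted scores.
structure = np.array([np.corrcoef(X[:, j], y_hat)[0, 1] for j in range(X.shape[1])])
# Zero-order correlations of each predictor with the dependent variable.
r_xy = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])

print("structure coefficients:", np.round(structure, 3))
print("equal to r_xy / R:", np.allclose(structure, r_xy / multiple_r))
```

Read in reverse, and assuming Table 2 reports correlations with the predicted value as its column heading indicates, multiplying each structure coefficient by R = √.612 ≈ .782 gives the implied correlation of that test with T6; for T7, .937 × .782 ≈ .73.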



Insert Table 2 here 



From these results we can conclude not only that there is a strong correlation between the predictor variables and the dependent variable, but also that all four predictor variables are important in predicting the dependent variable. Hence, it has been established that general verbal information (T5), sentence completion (T7), word classification (T8), and word meaning (T9) abilities are strong predictors of student paragraph comprehension (T6). At this point it would be very easy to infer causality by suggesting that the four predictor variables cause students to comprehend paragraphs better. However, we cannot do this. Our interpretation must stop at establishing a positive relationship. We cannot say for certain that these predictor abilities cause students to comprehend paragraphs better; an additional variable that is not part of this study could be the actual cause.
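The possibility of an unmeasured cause can be made concrete with a simulation. The sketch below assumes a single hypothetical latent ability that generates all five test scores, with arbitrary loadings; T6 is produced from that ability alone, so by construction none of the other tests causes it. Regressing T6 on the other four still yields a substantial R², and the fitted regression implies that raising T5 would raise T6, yet raising T5 leaves T6 untouched because T5 is not among its causes.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 301
ability = rng.normal(size=n)  # hypothetical latent ability; never observed by the analyst


def make_test(loading):
    """Test score = loading * ability + test-specific noise (unit variance overall)."""
    return loading * ability + rng.normal(scale=np.sqrt(1 - loading**2), size=n)


t5, t7, t8, t9 = (make_test(loading) for loading in (0.75, 0.85, 0.65, 0.80))
t6_noise = rng.normal(scale=np.sqrt(1 - 0.85**2), size=n)


def generate_t6(ability, noise):
    # T6 depends only on ability and its own noise; no other test enters here.
    return 0.85 * ability + noise


t6 = generate_t6(ability, t6_noise)

# Multiple regression of T6 on T5, T7, T8, T9 (with intercept).
X = np.column_stack([np.ones(n), t5, t7, t8, t9])
coef, *_ = np.linalg.lstsq(X, t6, rcond=None)
residuals = t6 - X @ coef
r_squared = 1 - residuals @ residuals / np.sum((t6 - t6.mean()) ** 2)

# Hypothetical intervention: raise every student's T5 score by 2 units.
X_boosted = X.copy()
X_boosted[:, 1] += 2.0
predicted_change = np.mean(X_boosted @ coef - X @ coef)  # what the regression implies
t6_after = generate_t6(ability, t6_noise)                  # T5 is not an input, so T6 is unchanged
actual_change = np.mean(t6_after - t6)

print(f"R^2 = {r_squared:.2f}")
print(f"predicted change in T6: {predicted_change:.2f}, actual change: {actual_change:.2f}")
```

The regression coefficients describe how T6 co-varies with the other tests in this simulated population; they do not say what would happen to T6 if any one test score were changed.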

Conclusion 

If the cycle of misinterpreting the meaning of correlation is to be broken, authors need to be the first to make the distinction between correlation and causation clear. If authors are themselves unclear about the implications of their research, how can we expect non-technical readers to draw appropriate conclusions? Inconsistencies in the interpretation and presentation of results must be corrected before researchers present their work on any platform. This will improve readers’ interpretation of one’s work and promote the quest for truth.
Table 1

Sum of Squares Breakdown from Regression Analysis

Model              Sum of Squares     df    Mean Square     F     Sig.
1   Regression           2238.452      4        559.613
    Residual             1420.498    296          4.799
    Total                3658.950    300

a. Predictors: (Constant), T9, T8, T5, T7
Dependent Variable: T6



Table 2

Regression Structure Coefficients

                                            Unstandardized
                                            Predicted Value
Unstandardized      Pearson Correlation     1
Predicted Value     Sig. (2-tailed)
                    N                       301
T5                  Pearson Correlation     .840**
                    Sig. (2-tailed)         .000
                    N                       301
T7                  Pearson Correlation     .937**
                    Sig. (2-tailed)         .000
                    N                       301
T8                  Pearson Correlation     .744**
                    Sig. (2-tailed)         .000
                    N                       301
T9                  Pearson Correlation     .901**
                    Sig. (2-tailed)         .000
                    N                       301

** Correlation is significant at the 0.01 level (2-tailed).